Brief View of Lyrics Dataset

This notebook focuses on exploring different genres, so songs classified as “Other” or “Not Available” are filtered out. In addition, the year column contains two outliers, so only songs published since 1968 are kept.
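The filtering step can be sketched with pandas as below. This is a minimal sketch, not the notebook's actual code: the column names "genre" and "year" are assumptions, and the toy rows (including the made-up outlier year 112) only illustrate the rule.

```python
import pandas as pd

# Hypothetical column names ("genre", "year"); adjust to the real dataset.
EXCLUDED_GENRES = {"Other", "Not Available"}
MIN_YEAR = 1968

def filter_lyrics(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unlabeled genres and songs dated before MIN_YEAR (the year outliers)."""
    keep = ~df["genre"].isin(EXCLUDED_GENRES) & (df["year"] >= MIN_YEAR)
    return df.loc[keep].reset_index(drop=True)

# Toy data for illustration only.
songs = pd.DataFrame({
    "genre": ["Rock", "Other", "Pop", "Folk", "Not Available"],
    "year": [2006, 2007, 112, 2016, 2010],
})
result = filter_lyrics(songs)
print(result["genre"].tolist())  # ['Rock', 'Folk']
```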

The plots show that Rock has been the most popular genre, by number of songs, since 1968. The music industry has been more prosperous in the last 15 years than before, with a burst during 2005-2008. However, how can we evaluate the popularity of these songs? It is unreasonable to rely only on the number of songs.
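The song counts behind these plots can be tabulated with a group-by on year and genre. The sketch below is illustrative only: it assumes "year" and "genre" columns and uses toy rows. It produces exactly the kind of raw count that, as argued above, is not enough on its own to measure popularity.

```python
import pandas as pd

def songs_per_year(df: pd.DataFrame) -> pd.DataFrame:
    """Count songs per (year, genre): rows are years, columns are genres."""
    return df.groupby(["year", "genre"]).size().unstack(fill_value=0)

# Toy data for illustration only; "year" and "genre" are assumed column names.
songs = pd.DataFrame({
    "genre": ["Rock", "Rock", "Pop", "Folk", "Rock", "Pop"],
    "year": [2006, 2006, 2006, 2007, 2007, 2007],
})
counts = songs_per_year(songs)
print(counts)
print(counts.sum().idxmax())  # genre with the most songs overall: Rock
```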

Taking a View from RFM

As mentioned above, the total number of songs alone is not enough to conclude whether artists prefer to create songs in a particular genre. If we treat artists' preference for producing songs in different genres as a market, the genres show distinct behavior types in that market and can be regarded as different customers. It is interesting to evaluate these customers' value with a modified RFM (Recency, Frequency, Monetary) model. A genre's value is reflected in its score: the genres with the highest scores are the most valuable to the market, which can be interpreted as being the most popular among artists.

For the Modified RFM Model, R, F and M are redefined as follows, so that each genre is scored on its performance in recent years as well as on its appearances since it was first recorded:

* R: the interval (in years), counting back from 2016, within which the genre's songs reach 5% of its total number of songs.
* F: the number of songs divided by the time interval between the first year the genre appears and 2016.
* M: the number of songs published in 2016.
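The three metrics can be computed from a genre's songs-per-year series roughly as below. This is a sketch under stated assumptions, not the notebook's code: R is taken as the plain year difference 2016 minus the year in which the backward cumulative count first reaches 5%, the F interval is floored at 1 year to avoid dividing by zero, and the input series is hypothetical. The text does not say how the metrics are combined into the 10-point score, so the sketch stops at the raw R, F and M values.

```python
import pandas as pd

END_YEAR = 2016
SHARE = 0.05  # R looks back until 5% of the genre's songs are covered

def modified_rfm(counts: pd.Series) -> dict:
    """counts: songs per year for one genre, indexed by year (assumed non-empty)."""
    counts = counts[counts > 0].sort_index()
    total = counts.sum()
    # R: walk back from END_YEAR, accumulating songs, until SHARE of the total is reached.
    cumulative = counts.sort_index(ascending=False).cumsum()
    year_reached = cumulative[cumulative >= SHARE * total].index[0]
    recency = int(END_YEAR - year_reached)
    # F: songs per year since the genre first appeared (interval floored at 1 year).
    span = max(END_YEAR - counts.index.min(), 1)
    frequency = float(total / span)
    # M: songs published in END_YEAR.
    monetary = int(counts.get(END_YEAR, 0))
    return {"R": recency, "F": frequency, "M": monetary}

# Hypothetical genre: 100 songs in total, first seen in 2000.
example = pd.Series({2000: 60, 2010: 30, 2015: 6, 2016: 4})
print(modified_rfm(example))  # {'R': 1, 'F': 6.25, 'M': 4}
```

Here 2016 alone covers 4 songs (below 5 = 5% of 100), and 2016 plus 2015 cover 10, so R is 1 year.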

It can be concluded that “Pop” and “Rock” are the most “valuable” genres, with scores of 8 and 7.17 on a scale of 10, while “Folk” performs poorly among the 10 categories. Some differences among the genres' features may cause this phenomenon:

Length of Lyrics

In this part, I explored the length of songs, defined as the number of words in the whole lyrics of a song.
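A minimal sketch of this length measure, assuming words are simply whitespace-separated tokens (the notebook's actual tokenizer is not stated) and a hypothetical "lyrics" column:

```python
import pandas as pd

def lyric_length(lyrics: str) -> int:
    """Number of whitespace-separated words in one song's lyrics."""
    # str.split() with no argument splits on any whitespace, including line breaks.
    return len(lyrics.split())

# Toy lyrics for illustration only.
songs = pd.DataFrame({"lyrics": ["la la la\nsing with me", "hey\nhey hey"]})
songs["n_words"] = songs["lyrics"].map(lyric_length)
print(songs["n_words"].tolist())  # [6, 3]
```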

A boxplot shows the characteristics of the distribution directly.

The distribution of lyric word counts is skewed, with many outliers, so we cannot conclude that there are significant differences among the three genres.
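The outliers in a boxplot are usually the points outside 1.5 times the interquartile range (IQR) beyond the quartiles. This is, for example, matplotlib's default `whis=1.5`. The sketch below applies that conventional rule. It assumes the plots used the same rule, which the text does not state, and uses made-up word counts.

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return values outside [Q1 - whis*IQR, Q3 + whis*IQR], the usual boxplot flier rule."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return x[(x < low) | (x > high)]

# Made-up word counts: one very long song stands out.
print(iqr_outliers([100, 120, 130, 140, 150, 160, 900]))  # [900.]
```

With Q1 = 125 and Q3 = 155, the IQR is 30, so any count above 200 or below 80 is flagged. A long right tail like this is why the word-count distribution looks skewed.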

When the word count is limited to 300, the distributions of lyric length look similar for Folk and Pop, while Rock reaches its maximum width at a lower word count. Therefore, lyric length may not be the direct cause of the differences among these genres.

Lyrics Density

Lyrics have a unique form that distinguishes them from other types of text. A line usually ends with a line break rather than a punctuation mark, so both "\n" and "\r\n" must be taken into account when splitting the text into sentences. Since the total number of words does not clearly distinguish folk, rock and pop music, this part explores their lyric density, defined as the number of words in a sentence. Because some songs have poor end marks and therefore cannot be split into independent sentences, I used quantiles to explore this feature.
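The splitting and quantile steps can be sketched as follows. This assumes each non-empty line counts as one sentence and words are whitespace-separated. The function names and the toy lyric are illustrative, not taken from the notebook.

```python
import re
import numpy as np

# Matches both Unix ("\n") and Windows ("\r\n") line endings.
LINE_BREAK = re.compile(r"\r?\n")

def line_word_counts(lyrics: str) -> list[int]:
    """Split lyrics on line breaks and count the words in each non-empty line."""
    lines = LINE_BREAK.split(lyrics)
    return [len(line.split()) for line in lines if line.strip()]

def density_quantiles(lyrics: str, qs=(0.1, 0.25)) -> dict:
    """Low quantiles of words per line; songs with missing line breaks produce
    a few very long 'lines', which low quantiles are less sensitive to."""
    counts = line_word_counts(lyrics)
    return {q: float(np.quantile(counts, q)) for q in qs}

song = "la la la\r\nsing with me now\n\nhey"
print(line_word_counts(song))  # [3, 4, 1]
print(density_quantiles(song))
```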

According to “Writing Tips: 25 words rule”, a sentence in an article should preferably contain fewer than 25 words. Turning back to lyrics, it makes sense to expect even fewer words per sentence in a song than in an article, so I chose the 0.25 and 0.1 quantiles to examine lyric density.

The plots above show that pop and rock music have similar lyric density distributions in recent years, whereas folk music lyric density fluctuates noticeably over the last few years.

I also noticed that, for folk music, the total number of songs does not increase in years when lyric density rises. However, there is no clear pattern between the yearly ranks of song counts and of density, which indicates that folk artists do not produce (or, in RFM terms, invest in) songs based on lyric density.
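Both checks can be made concrete with a per-year table. The sketch below is illustrative only. It assumes a hypothetical table with one row per year and columns "n_songs" and "density". It uses the Spearman rank correlation (the Pearson correlation of the ranks) as one possible way to compare the number and density ranks. The text does not say which measure was used.

```python
import pandas as pd

def rank_relationship(yearly: pd.DataFrame) -> float:
    """Spearman correlation: Pearson correlation of the (average) ranks."""
    return yearly["n_songs"].rank().corr(yearly["density"].rank())

def count_change_when_density_rises(yearly: pd.DataFrame) -> pd.Series:
    """Year-over-year change in song count, restricted to years where density rose."""
    diffs = yearly.sort_index().diff().dropna()
    return diffs.loc[diffs["density"] > 0, "n_songs"]

# Hypothetical folk-music summary, one row per year.
yearly = pd.DataFrame(
    {"n_songs": [12, 9, 15, 14], "density": [6.0, 7.5, 7.0, 8.0]},
    index=[2012, 2013, 2014, 2015],
)
print(rank_relationship(yearly))  # about 0: no rank pattern
print(count_change_when_density_rises(yearly).tolist())  # [-3.0, -1.0]
```

In this toy table, the song count drops in both years where density rises, yet the rank correlation is zero. This mirrors the combination of findings described above.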

Word Frequency

In this part, word frequency is analyzed to find what Folk, Rock and Pop have in common and what characterizes each of them.
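The counts behind the word clouds and bar plots can be sketched with `collections.Counter`. This is a minimal sketch: the tiny stop-word list is a placeholder (a real analysis would use a fuller list), and tokens are assumed to be lowercase runs of letters and apostrophes.

```python
import re
from collections import Counter

# Placeholder stop-word list for illustration; not the one used in the notebook.
STOP_WORDS = {"the", "a", "and", "i", "you", "to", "my", "of", "in", "me", "it"}
TOKEN = re.compile(r"[a-z']+")

def top_words(lyrics_list, n=10):
    """Most common non-stop-words across a list of lyrics."""
    counts = Counter()
    for text in lyrics_list:
        tokens = TOKEN.findall(text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Toy lyrics for one genre.
genre_lyrics = ["Love is a cold heart", "my heart will die of love", "love love me"]
print(top_words(genre_lyrics, n=2))  # [('love', 4), ('heart', 2)]
```

Running this once per genre and comparing the top lists is one way to separate words shared by all genres from words specific to one.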

The word clouds and word frequency bar plots show that some of the most frequent words are shared across the genres, such as love, heart, life, lie and cold. Some words are unique to folk music, such as christmas and chorus, which relate to faith, and die, which is closely related to death. Rock is more about feelings: it talks about the world, dreams and life, and uses words such as hand, eye and heart to express these feelings. Pop music has no single typical characteristic, and its words are often related to love: cry and tear (perhaps related to breaking up) and smile (though it accounts for only a small proportion). Therefore, some pop songs are about sad love stories.

Sentiment Analysis

To draw more general conclusions and make the word frequencies more meaningful, sentiment words are extracted and scored.
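Lexicon-based scoring works roughly as below. The lexicon here is a tiny hypothetical one (each word is +1 or -1) that stands in for whatever sentiment dictionary the notebook used, which the text does not name.

```python
import re

# Hypothetical miniature lexicon: +1 = positive word, -1 = negative word.
LEXICON = {"love": 1, "smile": 1, "dream": 1,
           "cry": -1, "tear": -1, "die": -1, "lie": -1, "cold": -1}
TOKEN = re.compile(r"[a-z']+")

def sentiment_scores(text: str) -> dict:
    """Count positive and negative lexicon words; net = positive - negative."""
    tokens = TOKEN.findall(text.lower())
    pos = sum(1 for t in tokens if LEXICON.get(t, 0) > 0)
    neg = sum(1 for t in tokens if LEXICON.get(t, 0) < 0)
    return {"positive": pos, "negative": neg, "net": pos - neg}

print(sentiment_scores("I smile when I dream, but I cry and tear"))
# {'positive': 2, 'negative': 2, 'net': 0}
```

Summing these per-song scores by genre and year gives the kind of yearly positive and negative totals compared below.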

From 1968 to 2016, Rock and Pop music show a balance between positive and negative sentiment scores, while Folk music fluctuates over these 49 years because of missing values in some years.

Rock and Pop also have something in common in this period: both show stronger sentiment expressions than Folk in the last 10 years. Since two metrics of the Modified RFM Model are directly related to time intervals focusing on recent years, sentiment might be one of the factors that attract artists. Moreover, Rock tends to express negative feelings while Pop tends to express positive feelings, which is a possible reason why pop music scores higher than rock music in the RFM Model.

Emotion Analysis

The overall sentiment analysis has revealed that artists in these years are producing sentimental songs, most of them positive. Since the distribution of sentiment scores differs among the three genres, this part goes a step further and examines exactly which types of emotion vary.

The heatmap clearly shows that, in most cases, emotion scores in all three genres increase from surprise to anger (trust scores about the same as sadness). Before 2010, folk music expresses strong emotions across many categories, mainly anger; after 2010, it expresses more anticipation. Compared with folk music, rock and pop music express more disgust, fear and joy after 2010, and they also show smaller proportions of sadness after 2010. In addition, rock music expresses more anger and anticipation than pop music. These similarities between pop and rock, together with the differences among the three genres, may contribute to the score variation in the Modified RFM Model.
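The matrix behind such a heatmap can be built as below. This sketch rests on several assumptions. The input is a hypothetical table with one row per emotion-word occurrence, with columns "genre", "year" and "emotion", as an emotion lexicon lookup might produce. The periods are split at 2010, with 2010 itself counted in the later period. Each row is normalized to the share of each emotion, so genres with different song counts stay comparable.

```python
import pandas as pd

def emotion_matrix(emotion_words: pd.DataFrame) -> pd.DataFrame:
    """Rows: (genre, period); columns: emotions; values: share of emotion words."""
    df = emotion_words.assign(
        period=lambda d: d["year"].map(lambda y: "before 2010" if y < 2010 else "2010 and after")
    )
    counts = pd.crosstab([df["genre"], df["period"]], df["emotion"])
    return counts.div(counts.sum(axis=1), axis=0)

# Hypothetical emotion-word occurrences.
emotion_words = pd.DataFrame({
    "genre": ["Folk", "Folk", "Folk", "Folk", "Pop", "Pop"],
    "year": [2005, 2006, 2012, 2013, 2012, 2014],
    "emotion": ["anger", "anger", "anticipation", "joy", "joy", "joy"],
})
print(emotion_matrix(emotion_words))
```

The resulting table could then be passed to a heatmap function such as seaborn's `heatmap`.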

In the heatmap of folk music alone, the transition around 2008 is easier to detect: trust and joy decline.

Conclusion

From the RFM Model, if we treat different genres as products in a market, artists show a “buying” behavior when “purchasing” these products (that is, producing new songs). As a result, Pop and Rock score highest, while folk music scores lowest. To find out what might cause this phenomenon, analysis is performed on lyric length, lyric density, sentiment, and emotion.